Make flake in configMap update e2e easier to debug #21573
Conversation
Labelling this PR as size/XS
LGTM
b53584c to b311c9e (Compare)
@@ -181,7 +181,7 @@ var _ = Describe("ConfigMap", func() {
 	// Kubelet projects the update into the volume and the container picks
 	// it up. This timeout is based on the default Kubelet sync period (1
 	// minute) plus additional time for fudge factor.
-	const podLogTimeout = 90 * time.Second
+	const podLogTimeout = 150 * time.Second
I think the problem here is that my timeout in the original test was too optimistic. It may take a full sync interval in the kubelet to pick up the change; bumping way up to avoid future timeouts.
Guess there's no way to programmatically make the sync happen sooner?
Is there a way to link this directly to the sync interval, so it floats against it? Also, it should be ≤ 2*interval, right?
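A minimal sketch of what "floating against the sync interval" could look like, assuming a hypothetical kubeletSyncPeriod constant that mirrors the kubelet's default 1-minute sync period (the test as written hardcodes the value instead; names and numbers here are illustrative, not the actual change):

```go
package e2e

import "time"

// Hypothetical sketch: derive the log-read timeout from the kubelet sync
// period instead of hardcoding it, so it floats if the period ever changes.
const kubeletSyncPeriod = 1 * time.Minute // assumed default kubelet sync period

// Allow up to two full sync intervals plus a fudge factor for the pod worker
// to wake up and project the updated ConfigMap into the volume.
const podLogTimeout = 2*kubeletSyncPeriod + 30*time.Second
```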
If this is an update that needs to be processed by a pod worker, we can't be too fine-grained with timeouts. The kubelet needs to wake up and invoke the worker. You can probably engineer a timeout that lines up pretty well with when a pod worker is woken up, but if that worker is already busy, or many workers are up and competing for CPU, there's very little you can do to guarantee it finishes exactly on some edge of the kubelet sync.
Employ the same strategy as we do for pod readiness? (poll till 5m)
Polling for 5 min sounds legit -- does that mean this should move to [Slow]?
I think there are already too many tests that poll for 5m in the normal GCE suite. We can catch it via the test dashboard if this one keeps crossing 5m; I don't think any of the tests that currently poll for 5m actually take that long. If one of the generic timeouts in e2e/util is applicable in this case, just use that, and we'll bump them all at the same time if we need to.
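A rough sketch of the poll-based approach being discussed, using the wait.Poll utility from the repo's util/wait package; the logReader type and waitForUpdatedValue helper are hypothetical names for illustration, not the actual change in this PR:

```go
package e2e

import (
	"time"

	"k8s.io/kubernetes/pkg/util/wait"
)

// logReader is a hypothetical stand-in for however the test reads the
// container's log output through the e2e framework's client.
type logReader func() (string, error)

// waitForUpdatedValue polls until the pod's log reflects the updated
// ConfigMap value, giving up after 5 minutes (the same strategy used for
// pod readiness) instead of relying on a single hardcoded timeout.
func waitForUpdatedValue(readLogs logReader, want string) error {
	return wait.Poll(10*time.Second, 5*time.Minute, func() (bool, error) {
		out, err := readLogs()
		if err != nil {
			// Transient read errors just mean we poll again.
			return false, nil
		}
		return out == want, nil
	})
}
```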
GCE e2e test build/test passed for commit b53584c0fab3b4e0445545bb7c827a8b1b25d75b.
PR changed after LGTM, removing LGTM.
Labelling this PR as size/S
b311c9e to f8d58ac (Compare)
@bprashanth @ihmccreery bumped timeout to 5m per @bprashanth's idea.
GCE e2e test build/test passed for commit b311c9e95ea6de8a10be8532d48d626a08965df3.
If this consistently takes 5m it's a bug we should address. Otherwise I'm fine with another bug saying "investigate config map slowness" etc. (maybe filed on yourself? ;) and bumping it. It will validate the theory that this is in fact a sluggish pod worker, and we can catch it if it keeps exceeding 5m.
GCE e2e build/test failed for commit f8d58ac. |
GCE e2e build/test failed for commit f8d58ac. |
GCE e2e test build/test passed for commit f8d58ac. |
@k8s-oncall trap is green so I'm hand-merging. This is just a clarity fix and a timeout bump.
Make flake in configMap update e2e easier to debug
Help with #21244
@ihmccreery @ixdy